AIbase

# Visual Scene Understanding

Distill Any Depth Small Hf
MIT
Distill-Any-Depth is a state-of-the-art monocular depth estimation model trained with knowledge distillation, designed for efficient and accurate depth estimation.
3D Vision, Transformers
xingyang1
1,214
3
Llava SpaceSGG
Apache-2.0
LLaVA-SpaceSGG is a visual question-answering model based on LLaVA-v1.5-13b that focuses on scene graph generation. It interprets image content and generates structured scene descriptions.
Text-to-Image, English
wumengyangok
36
0
Dpt Dinov2 Giant Nyu
Apache-2.0
A DPT (Dense Prediction Transformer) model that uses DINOv2 as its backbone for monocular depth estimation.
3D Vision, Transformers
facebook
29
1
© 2025 AIbase